Rate-Distortion Models for Error Resilient Video Transcoding
Field of the Invention 

[01] This invention relates generally to transcoding videos, and more
particularly to rate and distortion models for allocation of bits used to code 
the video source and bits that are applied for error resilience. 

Background of the Invention 

[02] Transmitting a video bitstream through wireless channels is a
challenging problem due to limited bandwidth and noisy channels. If a
video is originally coded at a bit rate greater than the available bandwidth of a
wireless channel, then the video must first be transcoded to a lower bit rate
prior to transmission. Because a noisy channel can easily degrade the quality of
the video, there is also a need to make the encoded video bitstream resilient to
transmission errors, even though the overall number of bits allocated to the
bitstream is reduced.

[03] The two primary methods used for error-resilient video encoding are
resynchronization marker insertion and intra-block insertion (intra-refresh).
Both methods are effective at localizing errors, and localized errors are
easier to recover from.

[04] Resynchronization inserts periodic markers so that, when an error
occurs, decoding can be restarted at the following resynchronization
marker. In this way, errors are spatially localized. There are two
basic approaches for inserting synchronization markers: a group-of-blocks
(GOB) based approach, which is adopted in the H.261/H.263 standards, and a
packet-based approach, which is adopted in the MPEG-4 standard.

[05] In the GOB-based approach, a GOB header is inserted periodically
after a fixed number of macroblocks (MBs). In the packet-based approach,
header information is placed at the start of each packet. Because packets
are formed according to the number of bits rather than the number of MBs,
the packet-based approach spaces resynchronization points more uniformly in
the bitstream than the GOB-based approach does.
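The uniformity argument can be illustrated with a small Python sketch. It compares the two placement rules on hypothetical per-MB bit counts. The function names and the simplified packet rule are illustrative assumptions, not the exact H.263 or MPEG-4 syntax. Under that rule, a new packet starts at the first MB that would push a non-empty packet past a bit budget.

```python
def gob_markers(mb_bits, mbs_per_gob):
    """MB indices where a GOB header is inserted: every mbs_per_gob MBs."""
    return list(range(0, len(mb_bits), mbs_per_gob))


def packet_markers(mb_bits, packet_bits):
    """MB indices that start a new packet (simplified bit-budget rule).

    A new packet starts at an MB boundary whenever adding that MB would
    push a non-empty packet past packet_bits.
    """
    starts, acc = [0], 0
    for i, bits in enumerate(mb_bits):
        if acc > 0 and acc + bits > packet_bits:
            starts.append(i)
            acc = 0
        acc += bits
    return starts


def segment_bits(mb_bits, starts):
    """Number of bits between consecutive resynchronization points."""
    bounds = starts + [len(mb_bits)]
    return [sum(mb_bits[a:b]) for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    mbs = [120, 80, 50, 50, 200, 40, 60, 100, 30, 30, 30, 110]  # hypothetical
    print(segment_bits(mbs, gob_markers(mbs, 4)))       # [300, 400, 200]
    print(segment_bits(mbs, packet_markers(mbs, 200)))  # [200, 100, 200, 200, 200]
```

With GOB markers the segment sizes follow the content (200 to 400 bits here). The bit-budget rule keeps most segments at the 200-bit target.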

[06] While resynchronization marker insertion is suitable to provide a 
spatial localization of errors, the insertion of intra MBs is used to provide a 
temporal localization of errors by decreasing the temporal dependency in the 
encoded video bitstream. 

[07] A number of error-resilient video encoding methods are known. In
"Error-resilient transcoding for video over wireless channels," IEEE Journal
on Selected Areas in Communications, vol. 18, no. 6, pp. 1063-1074, 2000,
by Reyes, et al., optimal bit allocation between error resilience insertion and
video encoding is achieved by modeling the rate-distortion of error 
propagation due to channel errors. However, that method assumes that the 
actual rate-distortion characteristics of the video are known, which makes 
the optimization difficult to realize practically. Also, that method does not 
consider the impact of error concealment. 

[08] In "Optimal mode selection and synchronization for robust video 
communications over error-prone networks," IEEE Journal on Selected 
Areas in Communications, vol. 18, no. 6, pp. 952-965, 2000 by Cote, et al.,
the optimal error resilience insertion problem is divided into two sub- 
problems: optimal mode selection for MBs; and optimal resynchronization 
marker insertion. That optimization is conducted on an MB basis and inter- 
frame dependency is not considered. 

[09] Another method, described by Zhang, et al., in "Video coding with
optimal inter/intra-mode switching for packet loss resilience," IEEE Journal
on Selected Areas in Communications, vol. 18, no. 6, pp. 966-976, 2000,
recursively determines the total decoder distortion with pixel-level precision to
account for spatial and temporal error propagation in a packet loss 
environment. That method attempts to select an optimal MB encoding mode. 
That method is quite accurate on the MB level when compared with other 
methods. However, that method does not consider the inter-frame 
dependency and the optimization is only conducted on the current MB. 

[010] Dogan, et al. describe a video transcoding framework for general 
packet radio service (GPRS) in "Error-resilient video transcoding for robust 
inter-network communications using GPRS," IEEE Transactions on Circuits 
and Systems for Video Technology, vol. 12, no. 6, pp. 453-464, 2002. 
However, the bit allocation between inserted error resilience and the video 
encoding is not optimized in that method. 

[011] For video distortion caused by channel errors, a low-complexity video
quality model has been described by Reibman et al., in "Low-complexity
quality monitoring of MPEG-2 video in a network," in Proceedings IEEE
International Conference on Image Processing, September 2003. However,
the measurement to determine error propagation effects is only based on the
received bitstream. One of the most important aspects that is not fully 
considered by that method is the issue of inter-frame dependency, which is a 
key factor in motion compensated video encoding. Often, bit allocation and 
encoding mode selection are optimized only for the current MB or the 
current frame. 

[012] It is desired to provide an optimal solution that reduces the video bit
rate while maintaining error resilience. It is also desirable to have models
that account for inter-frame dependency, which is inherent to many coding
schemes, and that also accurately account for the propagation of errors at the
receiver. This is especially important when a video bitstream is transferred
from a channel with a high bandwidth and a low bit-error rate (BER), for
example, a wired channel, to a channel with a low bandwidth and a high
BER, for example, a wireless channel. For such a low-bandwidth channel,
the combined task of bit rate reduction and error resilience insertion is
essential because the bit rate reduction needs to be balanced against the
additional error resilience bits.

Summary of the Invention 

[013] The invention provides accurate rate-distortion (R-D) models for
transcoding videos. One model describes the R-D characteristics for
requantizing a video considering inter-frame dependencies. Other models
estimate the distortion relationship for error propagation in a motion
compensated video and characterize the rate for intra-block and
resynchronization marker insertion. These models are used in optimal bit
allocation schemes for video transcoding.

Brief Description of the Drawings 

[014] Figure 1 is a block diagram of rate-distortion models and a
transcoding method according to the invention; 

[015] Figure 2 is a block diagram of a video transcoder according to the 
invention; 

[016] Figure 3 is a block diagram of a video system according to the 
invention; 

[017] Figure 4 is a block diagram of a spatial concealment method used by 
the invention; 

[018] Figures 5 and 6 are diagrams showing the decomposition of the distortion
caused by channel errors for I- and P-frames of a video, respectively;

[019] Figure 7 is a graph comparing resynchronization marker insertion 
accuracy; and 

[020] Figure 8 is a graph comparing intra-block insertion accuracy. 
Detailed Description of the Preferred Embodiment

[021] As shown in Figure 1, the invention provides a method 100 for transcoding
an input video bitstream 101 so that the bit rate of an output bitstream 102
is reduced while error resilience is maintained under a given bit rate constraint
and channel condition. The method 100 subjects the input video to three
rate-distortion (R-D) models: a video source requantization model 111, an
intra-block refresh model 112, and a resynchronization marker model 113. The
outputs of the three models are input to a bit allocation control module 120,
which determines a quantization parameter 121, a resynchronization marker
rate 122, and an intra-block refresh rate 123. These parameters are used by a
transcoder 130 to form the output bitstream 102.

[022] The three models are novel in that inter-frame dependency is included 
in both a video source model and an error resilience model. In addition, the 
error resilience model in the transcoding considers error concealment at the 
receiver. 

[023] The invention also provides an alternative embodiment of the
transcoding method that achieves near-optimal performance at a lower
complexity.

[024] Transcoder Structure 

[025] Figure 2 shows a transcoder 200 according to the invention. The 
transcoder includes a decoder 210 and an encoder 220. The decoder 210 takes 
an input video bitstream 101 at a first bit rate. The encoder produces an output 
bitstream 102 at a second bit rate. In a typical application, the second bit rate
is less than the first bit rate. 

[026] The decoder 210 includes a variable length decoder (VLD) 211, a first
inverse quantizer (Q1^-1) 212, an inverse discrete cosine transform (IDCT) 213,
a motion compensation (MC) block 214, and a first frame store 215.

[027] The encoder 220 includes a variable length coder (VLC) 221, a
quantizer (Q2) 222, a discrete cosine transform (DCT) 223, a motion compensation
(MC) block 224, and a second frame store 225. The transcoder also includes a
second inverse quantizer (Q2^-1) 226 and a second IDCT 227.

[028] In addition, the encoder includes an intra/inter switch 228 and a 
resynchronization marker insertion block 229. 

[029] The bit allocation 120 of Figure 1 provides the quantization parameter 
121 to the quantizer 222, the resynchronization marker rate 122 to the 
resynchronization marker insertion block 229, and the intra-block refresh rate
123 to the intra/inter switch 228. 

[030] Problem Statement 

[031] It is an object of the invention to minimize an end-to-end distortion of 
the encoded video bitstream subject to rate constraints. An overall rate budget 
is allocated among the three different components that contribute to the rate, 
i.e., video source requantization, resynchronization marker insertion, and 
intra-refresh. 
[032] To achieve this object, three distinct components are described: the video source
requantization model, the intra-refresh model, and the resynchronization
marker insertion model. The latter two model error resilience.
Although there is some degree of dependency among these three components,
each component has a unique impact on the R-D characteristics of the
transcoded video under different channel conditions.

[033] The video source model accounts for the R-D characteristics of the 
video bitstream without resynchronization markers or intra-refresh insertion, 
while the error-resilience models account for the R-D characteristics of intra-block
insertion and resynchronization marker insertion.

[034] Although the separation of the error resilience model from the video 
source model is an approximation, it turns out to be quite accurate for the R-D 
optimized bit allocation scheme according to the invention. 

[035] The problem is formally stated as follows. Let the target bit rate constraint be
R_t, and let the total distortion be D, measured as the mean squared error (MSE).
Given these parameters, it is desired to minimize the distortion subject to the
target rate constraint, i.e., to solve

$$\min_{\{\omega_k\}} D = \sum_{k=1}^{K} d_k(\omega_k) \quad \text{subject to} \quad \sum_{k=1}^{K} r_k(\omega_k) \le R_t, \qquad (1)$$

[036] where d_k is the distortion caused by each of the K = 3 components,
k = 1, 2, 3, r_k is the rate of each component, and ω_k are the specific
parameters used in the allocation, e.g., quantization parameters,
resynchronization marker spacing, and intra-refresh rate.

[037] One way to solve the above problem is through a Lagrangian
optimization approach in which the following quantity is minimized:

$$J(\lambda) = \sum_{k=1}^{K} d_k(\omega_k) + \lambda \sum_{k=1}^{K} r_k(\omega_k), \qquad (2)$$

[038] where λ is the Lagrangian multiplier to be determined during the
optimization. A bisection process can be used to obtain the optimal multiplier
for this problem. However, that process is iterative and
computationally expensive. Also, obtaining the accurate R-D sample points
required by the optimization procedure is still an open issue.
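The bisection mentioned above can be sketched as follows. The sketch assumes each component offers a small, already measured list of (rate, distortion) operating points. The component lists, labels, and search range for λ are hypothetical. The sketch only illustrates Equation (2), not the invention's model-based method.

```python
def pick_points(components, lam):
    """For each component, choose the operating point minimizing d + lam * r.

    Each component is a list of (rate, distortion, label) tuples.
    """
    return [min(points, key=lambda p: p[1] + lam * p[0]) for points in components]


def total_rate(points):
    return sum(p[0] for p in points)


def lagrangian_bisection(components, rate_budget, lam_max=1e6, iters=60):
    """Bisect on lambda so that the chosen points fit within rate_budget (Eq. 2)."""
    chosen = pick_points(components, 0.0)  # lambda = 0: minimum distortion
    if total_rate(chosen) <= rate_budget:
        return chosen
    lo, hi = 0.0, lam_max
    best = pick_points(components, hi)
    if total_rate(best) > rate_budget:
        raise ValueError("rate budget cannot be met")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        chosen = pick_points(components, mid)
        if total_rate(chosen) <= rate_budget:
            hi, best = mid, chosen  # feasible: try a smaller lambda
        else:
            lo = mid                # infeasible: penalize rate more heavily
    return best


if __name__ == "__main__":
    components = [
        [(100, 50.0, "q=31"), (200, 20.0, "q=8"), (400, 8.0, "q=4")],  # requantization
        [(0, 30.0, "no markers"), (50, 15.0, "dense markers")],         # resync markers
        [(0, 25.0, "0% intra"), (80, 10.0, "10% intra")],               # intra refresh
    ]
    chosen = lagrangian_bisection(components, rate_budget=330)
    print([p[2] for p in chosen])  # ['q=8', 'dense markers', '10% intra']
```

Each evaluation minimizes d_k + λ r_k, so this search can only reach operating points on the lower convex hull of each component's R-D samples. It also presumes those samples are known in advance, which is the practical difficulty noted above.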

[039] It is preferred to use a distinct R-D model for each of the three 
components so that the optimization does not have to obtain the actual R-D 
values from simulation. With these models, some of the computational burden 
for solving the above problem is alleviated. However, this solution is 
relatively complex. Therefore, an alternative method that can solve the bit
allocation problem with similar performance, but with a much lower 
complexity, is sought and described as part of this invention. 
[040] Video Source Requantization Model

[041] Our R-D model for a coded video source operates on groups of pictures
(GOPs). This accounts for inter-frame dependency by considering the
requantization distortion in the current frame that propagates to the next frame
through motion compensation. The R-D model is then modified accordingly
for the next frame to account for this error propagation effect.

[042] If a composite signal, such as the output video 102, is decomposed into
independent components, i.e., the requantized video, the resynchronization
markers, and the intra-refresh blocks, then a composite R-D model can be
derived directly from the three individual R-D models. Furthermore, if the
signal can be decomposed into independent identically distributed (i.i.d.)
Gaussian sources with an energy-compacting transform, such as the DCT, then the
total distortion D of the signal caused by the encoding can be modeled as:

$$D(R) = \left[\prod_{l=0}^{L-1} \Phi(\omega_l)\right]^{1/L} e^{-\beta R}, \qquad (3)$$

[043] where L is the total number of frequency coefficients in the case of the
DCT, Φ(ω_l) is the power spectral density of coefficient l, R is the
bit rate of the signal, and β is a constant equal to 2 ln 2. An interesting
observation from this result is that, at a given rate, the distortion is
proportional to the product (through its geometric mean) of the coefficient
variances rather than to the sum of the variances.
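A minimal Python sketch of Equation (3) follows. It assumes R is measured as the average number of bits per coefficient. The function name and the variance values are illustrative. The sketch compares two coefficient sets with the same total variance to show the product-versus-sum observation.

```python
import math

BETA = 2.0 * math.log(2.0)  # constant in Eq. (3) for Gaussian sources


def gaussian_distortion(variances, rate):
    """Eq. (3): D = [prod(variances)]^(1/L) * exp(-BETA * rate).

    rate is assumed to be the average number of bits per coefficient.
    The geometric mean is computed in the log domain to avoid overflow.
    """
    log_gm = sum(math.log(v) for v in variances) / len(variances)
    return math.exp(log_gm - BETA * rate)


if __name__ == "__main__":
    compacted = [16.0, 4.0, 1.0, 1.0]  # energy concentrated in few coefficients
    flat = [5.5, 5.5, 5.5, 5.5]        # same total variance (22), spread evenly
    print(gaussian_distortion(compacted, 1.0))  # ~0.707
    print(gaussian_distortion(flat, 1.0))       # ~1.375
```

Both sets have a total variance of 22. Even so, the energy-compacted set has about half the distortion at the same rate.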

[044] The above model is only accurate for Gaussian sources with fine 
quantization. It is known that a video source can be characterized more 
accurately by a generalized Gaussian model. Furthermore, a video source
often needs to go through coarse requantization during transcoding to adapt to 
lower bandwidth constraints. 



[045] The following modifications are made to the model to accommodate
these two issues. First, the parameter β is made variable rather than a fixed
value, and second, the rate R in the exponent is replaced by R^γ.

[046] Furthermore, if the value [∏_{l=0}^{L-1} Φ(ω_l)]^{1/L} is replaced by σ^2, the
total variance of the signal, then

$$D = \sigma^2 e^{-\beta R^{\gamma}}. \qquad (4)$$

[047] Experimental data indicate that β is usually in the range [1, 10], and
γ is in the range [0, 1]. Then, for requantizing intra-coded frames, the
distortion is

$$D_0 = \sigma_0^2 e^{-\beta_0 R_0^{\gamma_0}}, \qquad (5)$$

[048] where D_0 is the distortion of the intra-coded frame caused by
requantization, and R_0 is the rate. The intra-coded variance, σ_0^2, can be
estimated in the frequency domain.



[049] It is possible to estimate the model parameters β and γ from two sample
points on the R-D curve, as described herein.
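When σ^2 is known, taking logarithms of Equation (4) twice gives a straight line in ln R. That is one way two sample points can determine β and γ. The Python sketch below assumes 0 < D < σ^2 at both samples and uses synthetic data. The function name and rate units are illustrative assumptions, because the document does not specify how the two-point fit is carried out.

```python
import math


def fit_beta_gamma(sigma2, r1, d1, r2, d2):
    """Fit beta and gamma of D = sigma2 * exp(-beta * R**gamma) from two samples.

    Taking logs twice gives a straight line:
        ln(ln(sigma2 / D)) = ln(beta) + gamma * ln(R),
    so two (R, D) points fix the slope gamma and the intercept ln(beta).
    Requires 0 < D < sigma2 at both points and r1 != r2.
    """
    y1 = math.log(math.log(sigma2 / d1))
    y2 = math.log(math.log(sigma2 / d2))
    x1, x2 = math.log(r1), math.log(r2)
    gamma = (y2 - y1) / (x2 - x1)
    beta = math.exp(y1 - gamma * x1)
    return beta, gamma


if __name__ == "__main__":
    def model(r):
        return 100.0 * math.exp(-2.0 * r ** 0.6)  # synthetic curve, beta=2, gamma=0.6
    print(fit_beta_gamma(100.0, 0.5, model(0.5), 2.0, model(2.0)))  # ~(2.0, 0.6)
```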



[050] Without considering inter-frame dependency, a similar model can be
used for inter-coded frames:

$$D_k = \sigma_k^2 e^{-\beta_k R_k^{\gamma_k}}, \quad k = 1, 2, \ldots, N-1, \qquad (6)$$

[051] where N is the total number of frames in a GOP, D_k is the distortion of
the inter-coded frame caused by requantization, R_k is the rate, and σ_k^2 is the
variance of the input signal. Again, the model parameters β_k and γ_k can be
estimated from two sample points on the R-D curve.

[052] The inter-frame dependency is modeled by changing the frame variance
σ_k^2 to σ'_k^2:

$$D_k = \sigma_k'^2 e^{-\beta_k R_k^{\gamma_k}} = \left(\sigma_k^2 + \alpha_k D_{k-1}\right) e^{-\beta_k R_k^{\gamma_k}}, \quad k = 1, 2, \ldots, N-1, \qquad (7)$$

[053] where σ'_k^2 = σ_k^2 + α_k D_{k-1} denotes the inter-frame variance, D_{k-1}
denotes the extra quantization residue error produced when the previous frame
is requantized with a larger Q-scale, and α_k denotes a propagation ratio, which
is determined by the amount of motion compensation. The term α_k D_{k-1} models
the dependency between the current and the previous frame. This term
captures the quantization error propagation effect caused by motion
compensation. That is, when the previous frame is quantized coarsely, more
quantization error propagates to the current frame through motion
compensation.
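The following sketch evaluates Equations (5) and (7) frame by frame to show how Equation (7) couples the frames of a GOP. The function name and all parameter values are hypothetical. Each P-frame inherits the distortion of the frame before it through α_k.

```python
import math


def gop_distortions(sigma2, beta, gamma, alpha, rates):
    """Per-frame requantization distortion in a GOP, Eqs. (5) and (7).

    Frame 0 is the I-frame (alpha[0] is ignored). For P-frames k >= 1 the
    effective variance sigma2[k] + alpha[k] * D[k-1] carries the previous
    frame's requantization error forward through motion compensation.
    """
    d = []
    for k, r in enumerate(rates):
        var = sigma2[k] if k == 0 else sigma2[k] + alpha[k] * d[k - 1]
        d.append(var * math.exp(-beta[k] * r ** gamma[k]))
    return d


if __name__ == "__main__":
    s2, b, g, a = [100.0, 40.0, 40.0], [2.0] * 3, [0.6] * 3, [0.0, 0.5, 0.5]
    print(gop_distortions(s2, b, g, a, [1.0, 1.0, 1.0]))
    print(gop_distortions(s2, b, g, a, [0.5, 1.0, 1.0]))  # coarser I-frame
```

Lowering the I-frame rate while keeping the P-frame rates fixed raises the distortion of every subsequent P-frame. This is the inter-frame dependency the model is meant to capture.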

[054] Model Parameter Estimation 

[055] Parameter estimation for the proposed R-D models is performed in two
stages on a GOP basis. In the first stage, all the frames in the GOP are
requantized with multiple sample quantization scales, e.g., 4, 8, and 31. For the
P-frames, no motion compensation is performed. Using the three sample R-D
points, the three parameters σ_0^2, β_0, and γ_0 are determined from Equation (5),
which establishes the model for the I-frame. Similarly, the parameters σ_k^2, β_k,
and γ_k are estimated from Equation (6), which establishes the model for the
P-frames without taking the propagation effect into account, i.e., the σ_k^2
estimated here denotes the variance of the input signal.
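The document does not state how the three sample points are fitted to Equation (5) or (6). One simple possibility is a grid search over γ in (0, 1], combined with an ordinary least-squares line in R^γ for each candidate. The sketch below is written under that assumption. The grid resolution, function name, and synthetic data are illustrative.

```python
import math


def fit_rd_model(rates, dists, gamma_grid=None):
    """Fit D = sigma2 * exp(-beta * R**gamma) to three or more (R, D) samples.

    For each candidate gamma the model is linear in R**gamma after taking
    logs (ln D = ln sigma2 - beta * R**gamma), so an ordinary least-squares
    line gives sigma2 and beta; the gamma with the smallest residual wins.
    """
    if gamma_grid is None:
        gamma_grid = [k / 100 for k in range(1, 101)]  # gamma in (0, 1]
    y = [math.log(d) for d in dists]
    n = len(y)
    my = sum(y) / n
    best = None
    for g in gamma_grid:
        x = [r ** g for r in rates]
        mx = sum(x) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        slope = sxy / sxx
        intercept = my - slope * mx
        sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
        if best is None or sse < best[0]:
            best = (sse, math.exp(intercept), -slope, g)
    _, sigma2, beta, gamma = best
    return sigma2, beta, gamma


if __name__ == "__main__":
    def model(r):
        return 100.0 * math.exp(-2.0 * r ** 0.6)  # synthetic "true" curve
    rs = [0.25, 0.5, 1.5]
    print(fit_rd_model(rs, [model(r) for r in rs]))  # close to (100.0, 2.0, 0.6)
```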



[056] The second stage takes care of propagation effects in the model
parameter estimates for the P-frames by determining α_k. To do this, first
requantize the I-frame at a different quantization scale than used in the first
stage, e.g., Q_I = 14. Second, requantize the P-frames at a different
quantization scale while performing motion compensation to account for the
propagation effects. With one sample point in a P-frame, the parameter σ'_k^2
can be estimated from Equation (7). Then, because σ'_k^2 = σ_k^2 + α_k D_{k-1}
in Equation (7), α_k is determined by:

$$\alpha_k = \frac{\sigma_k'^2 - \sigma_k^2}{D_{k-1}}, \qquad (8)$$

[057] where D_{k-1} is the distortion of the previous frame.
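The second-stage procedure amounts to inverting Equation (7) at one motion-compensated sample point and then applying Equation (8). The sketch below assumes β_k, γ_k, and σ_k^2 are already available from the first stage. It also assumes D_{k-1} has been measured for the requantized previous frame. The function name is illustrative.

```python
import math


def estimate_alpha(d_k, r_k, beta_k, gamma_k, sigma2_k, d_prev):
    """Propagation ratio alpha_k from one motion-compensated sample point.

    Inverts Eq. (7) to obtain the inter-frame variance sigma'_k^2, then
    applies Eq. (8): alpha_k = (sigma'_k^2 - sigma_k^2) / D_{k-1}.
    """
    sigma2_eff = d_k * math.exp(beta_k * r_k ** gamma_k)  # sigma'_k^2
    return (sigma2_eff - sigma2_k) / d_prev
```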



[058] The parameters γ_k and α_k are relatively constant within a given
sequence. Therefore, it is sufficient to estimate these parameters only once at
the start of a sequence, or when a scene change is detected. Parameters that
are more sensitive to the scene content, e.g., σ_k^2 and β_k, are updated for
each frame. The advantage of this simplification is that, after γ_k
and α_k are estimated at the start, the transcoding only needs to be performed
once to determine the model parameters, instead of twice. The parameter σ_k^2
is estimated from the variance of the DCT coefficients as expressed in
Equation (4), and β_k is estimated from one R-D sample point, which is
easily obtained by requantizing the current frame.
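The per-frame updates can be sketched as below. Taking σ_k^2 as the population variance of all DCT coefficients of the frame is an assumption, because the document does not say whether the DC term is included or how coefficients are pooled. β_k then follows in closed form from one (R_k, D_k) sample, with γ_k held at its sequence-level value. The function names are illustrative.

```python
import math


def coefficient_variance(coeffs):
    """Population variance of a frame's DCT coefficients (used as sigma_k^2)."""
    n = len(coeffs)
    mean = sum(coeffs) / n
    return sum((c - mean) ** 2 for c in coeffs) / n


def update_beta(sigma2_k, d_k, r_k, gamma_k):
    """Per-frame beta_k from one (R_k, D_k) sample with gamma_k held fixed.

    Solves D_k = sigma2_k * exp(-beta_k * R_k**gamma_k) for beta_k.
    """
    return math.log(sigma2_k / d_k) / (r_k ** gamma_k)
```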
[059] Error-Resilience R-D Models

[060] This section describes the second and third rate-distortion models, which
characterize the two error-resilience tools, i.e., resynchronization marker
insertion and intra-block refresh. First, a transmission environment is described, including the
system structure, type of channel, and methods of error concealment. Then, 
the distortion models for resynchronization and intra-block insertion (intra- 
refresh) are described. Here, the focus is on the distortion models, because the 
rate estimates are obtained in a rather straightforward manner. Specifically, 
the rate consumed by resynchronization markers can be determined from the 
number of bits in the resynchronization header and the resynchronization 
marker spacing, while the rate consumed by intra-refresh can be determined 
from the intra-refresh rate and the average rate increase by replacing an inter- 
coded MB with an intra-coded MB. 
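The two rate estimates can be written as simple functions. This sketch assumes one resynchronization header per spacing_bits of coded data and a fixed average bit increase per refreshed MB. The function and parameter names are hypothetical. For instance, header_bits could be set to the 56 bits (seven bytes) of per-packet overhead described in [062], and 99 MBs corresponds to one QCIF frame.

```python
import math


def resync_marker_bits(frame_bits, spacing_bits, header_bits):
    """Bits spent on resynchronization headers in one frame.

    Assumes one header per spacing_bits of coded data, rounded up.
    """
    return math.ceil(frame_bits / spacing_bits) * header_bits


def intra_refresh_bits(num_mbs, refresh_rate, avg_increase_bits):
    """Extra bits from forcing a fraction of inter-coded MBs to intra mode."""
    return refresh_rate * num_mbs * avg_increase_bits


if __name__ == "__main__":
    print(resync_marker_bits(10000, 1000, 56))  # 560
    print(intra_refresh_bits(99, 0.1, 150.0))   # ~1485
```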

[061] System Structure 

[062] Figure 3 shows a system 300 for transmitting and receiving a video 
bitstream via a noisy channel. Audio data 301 is generated and multiplexed 
with encoded video data 302. The data are transmitted 310 according to the 
H.324M standard defined for a typical mobile terminal, and an AL3
TransMux defined in Annex B of the H.223 standard. A 16-bit and an 8-bit
cyclic redundancy code (CRC) are used for error detection in the video and
audio payloads, respectively. For video packetization, a packet structure
described in the MPEG-4 resilience tool is used. This structure provides
resynchronization points at approximately equal bit intervals. In this way, a
typical video packet has seven bytes of overhead in total, including two bytes for
control, three bytes for the header, and two bytes for the CRC checksum. The
maximum video packet payload length is 254 bytes.

[063] A wireless channel 320 is represented by a binary symmetric
channel (BSC) model, which assumes independent bit errors 321 in a
bitstream. For error detection, recovery and concealment in the video receiver 
330, it is assumed that after an error is detected, either by a CRC checksum or 
by a video syntax check, the entire video packet containing the error is 
discarded, and the lost MBs are concealed. This is done to avoid disturbing 
visual effects caused by decoding erroneous packets. The receiver recovers 
the audio signal 303 and the video signal using a video decoder 304. 

[064] Other errors that can be detected include illegal VLC codewords, semantic
errors, an excessive number of DCT coefficients (> 64) in a block, and inconsistent
resynchronization header information, e.g., QP out of range, MBA(k) <
MBA(k-1), etc. Recovery from such errors is achieved by resynchronizing to the
added packet resynchronization markers or to the frame headers.

[065] For error concealment, both spatial and temporal error concealment 
methods are employed, using a simple block replacement scheme. 

[066] As shown in Figure 4, a spatial concealment method is employed for a 
lost MB 401 in an intra-coded frame. The concealment is performed by 
copying the MB from its immediate upper neighbor 402. 

[067] Similarly, temporal concealment is employed for a lost MB 410 in an 
inter-coded frame. Here, the motion vector 414 of the lost MB 410 is set to be 
the median of the motion vectors selected from three specific neighbors, i.e.,
blocks labeled a 411, b 412, and c 413 as shown in Figure 4. The MB in the
previous frame 415 that this motion vector is referencing is copied to the 
current location to recover the lost block 410. 
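The temporal concealment step can be sketched as follows. The document only says that the median of the three neighbor vectors is used. This sketch takes the median separately for the horizontal and vertical components, uses integer-pel vectors, and clamps the reference block to the frame boundary. All three are assumptions made here. Frames are plain lists of pixel rows, and the function names are illustrative.

```python
def median3(a, b, c):
    return sorted((a, b, c))[1]


def concealment_mv(mv_a, mv_b, mv_c):
    """Component-wise median of the three neighbor motion vectors (dx, dy)."""
    return (median3(mv_a[0], mv_b[0], mv_c[0]),
            median3(mv_a[1], mv_b[1], mv_c[1]))


def copy_block(prev_frame, top, left, mv, size=16):
    """Copy the size x size block that mv points to in the previous frame.

    prev_frame is a list of pixel rows; the source position is clamped so
    the copied block stays inside the frame.
    """
    h, w = len(prev_frame), len(prev_frame[0])
    y = min(max(top + mv[1], 0), h - size)
    x = min(max(left + mv[0], 0), w - size)
    return [row[x:x + size] for row in prev_frame[y:y + size]]
```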

[068] It is noted that the error-resilience models described in this invention
also apply to other prior-art error concealment schemes.

[069] Overall Distortion from Channel Error 

[070] Figures 5 and 6 show the decomposition of the overall distortion for I- 
and P-frames caused by channel errors. A rectangle 501 denotes the set of all 
the MBs in an I-frame, while a rectangle 601 denotes the set of all MBs in a 
P-frame. 

[071] For I-frames, distortion comes from lost intra-coded MBs (LS) 502,
which are spatially concealed. For P-frames, distortion comes from two parts:
distortion from lost MBs (L) 602, and distortion propagated from previously
corrupted MBs through motion compensation, which are referred to as MC
MBs 603. The lost MBs can be further decomposed into two categories:
inter-coded MBs (LT) 604 that are lost and concealed with temporal concealment, and
inter-coded MBs (LTC) 605 that are lost and concealed with temporal concealment,
but whose replacement MBs were themselves corrupted. Note that the LTC MBs form
the intersection of the L MBs and the MC MBs. The MCC MBs 606 are the MBs
that are received correctly but reference previously corrupted MBs through
motion compensation.
[072] If the number of MBs lost in a frame is Y_l, the number of MBs
corrupted through motion compensation is Y_mc, and the total number of MBs
in a frame is M, then the average number of corrupted MBs in a frame, E[Y],
can be expressed as:

$$E[Y] = E[Y_l] + E[Y_{mc}] - E[Y_{lc}], \qquad (9)$$

where Y_lc = Y_l ∩ Y_mc denotes the number of MBs that are both lost and
corrupted through motion compensation, i.e., the LTC MBs. This intersection is
proportional to the number of lost MBs and the number of inter-coded MBs
corrupted through motion compensation, and subsequently,

$$E[Y_{lc}] = E[Y_l \cap Y_{mc}] = \frac{E[Y_l] \, E[Y_{mc}]}{M}, \qquad (10)$$

[073] and the total average distortion, measured in MSE, can therefore be
calculated by:

$$D = \begin{cases} \dfrac{1}{M} E[Y_l] \cdot D_s, & \text{for an I-frame,} \\[2ex] \dfrac{1}{M}\left(E[Y_{lt}] \cdot D_t + E[Y_{lc}] \cdot D_{tc} + E[Y_{mcc}] \cdot D_{mc}\right), & \text{for a P-frame,} \end{cases} \qquad (11)$$

[074] where D_s is the average spatial concealment distortion, D_t is the
average temporal concealment distortion when copying a correct MB from
the previous frame, D_tc is the average temporal concealment distortion when
copying a corrupted MB from the previous frame, and D_mc is the average
distortion of correctly received MBs referencing corrupted MBs through motion
compensation. The number of LT MBs is Y_lt = Y_l - Y_lc, and the number of
MCC MBs is Y_mcc = Y_mc - Y_lc, as shown in Figure 6.
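Equations (9) to (11) combine into a few lines of arithmetic. The sketch assumes the expected counts E[Y_l] and E[Y_mc] have already been estimated. It also assumes the per-MB distortions D_s, D_t, D_tc, and D_mc are known. The function names are illustrative.

```python
def corrupted_mbs(M, e_lost, e_mc):
    """Eqs. (9)-(10): expected corrupted MBs and the lost/MC overlap E[Y_lc]."""
    e_lc = e_lost * e_mc / M
    return e_lost + e_mc - e_lc, e_lc


def frame_distortion_p(M, e_lost, e_mc, d_t, d_tc, d_mc):
    """Eq. (11), P-frame case, with Y_lt = Y_l - Y_lc and Y_mcc = Y_mc - Y_lc."""
    _, e_lc = corrupted_mbs(M, e_lost, e_mc)
    e_lt = e_lost - e_lc
    e_mcc = e_mc - e_lc
    return (e_lt * d_t + e_lc * d_tc + e_mcc * d_mc) / M


def frame_distortion_i(M, e_lost, d_s):
    """Eq. (11), I-frame case: every lost MB is spatially concealed."""
    return e_lost * d_s / M


if __name__ == "__main__":
    print(corrupted_mbs(100, 10.0, 20.0))                           # (28.0, 2.0)
    print(frame_distortion_p(100, 10.0, 20.0, 50.0, 80.0, 30.0))    # ~11.0
```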



[075] Techniques to determine each quantity in the above equation are 
described below. There are two categories of quantities: distortion related to 
concealing lost MBs, and distortion related to error propagation as a result of 
motion compensation. 
[076] Error Concealment Distortion

[077] The probability p_l that one MB is lost in a video frame n can be
modeled by the probability p_sl that a video packet is lost. If the channel bit
error rate (BER) is P_e, and the average video packet length in bits is L_s, then

$$p_l = p_{sl} = 1 - (1 - P_e)^{L_s}. \qquad (12)$$
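Equation (12) and the expected number of lost MBs, p_l · M, can be evaluated directly. In the sketch below, the BER of 10^-4 and the packet length are illustrative choices, not values prescribed by the document. The packet length is a maximum 254-byte payload plus 7 bytes of overhead, i.e., 2088 bits.

```python
def packet_loss_prob(ber, packet_bits):
    """Eq. (12): probability that a packet of packet_bits bits contains at
    least one error on a binary symmetric channel with bit error rate ber."""
    return 1.0 - (1.0 - ber) ** packet_bits


def expected_lost_mbs(ber, packet_bits, mbs_per_frame):
    """E[Y_l] = p_l * M, with p_l taken equal to the packet loss probability."""
    return packet_loss_prob(ber, packet_bits) * mbs_per_frame


if __name__ == "__main__":
    bits = (254 + 7) * 8                        # 2088 bits, illustrative
    print(packet_loss_prob(1e-4, bits))         # ~0.188
    print(expected_lost_mbs(1e-4, bits, 99))    # ~18.7 MBs in a QCIF frame
```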

[078] It follows that the average number of lost MBs, E[Y_l(n)], in frame n is
p_l · M. The distortion caused by losing one MB can be calculated according
to one of three situations:

the loss of an intra-coded MB that is spatially concealed, resulting
in distortion D_s;

the loss of an inter-coded MB that is temporally concealed by
copying a non-corrupted MB from the previous frame, resulting in
distortion D_t; and

the loss of an inter-coded MB that is temporally concealed by
copying a corrupted MB from the previous frame, resulting in
distortion D_tc.

[079] The values D_s and D_t can be estimated by calculating pixel differences
between the lost MB and the replacement MB. The value D_tc can be
approximated by adding the motion compensation corruption to D_t, e.g.,
D_tc = D_t + D_mc.
[080] Error Propagation Distortion

[081] A Markov model can be used to estimate error propagation by motion
compensation. A Markov model is appropriate because the number of MBs
corrupted through motion compensation in the current frame depends only on
the motion vectors in the current frame and the number of corrupted MBs in
the previous frame. The probability that a single MB is corrupted through
motion compensation can be determined by:

$$p_{mc} = p\,\theta_1 + \left[1 - (1 - p)^2\right]\theta_2 + \left[1 - (1 - p)^4\right]\theta_3, \qquad (13)$$

[082] where p is the probability of one MB being corrupted in the previous
frame, θ_1 denotes the proportion of MBs in the current frame that reference a
single MB, θ_2 denotes the proportion of MBs that reference two MBs, and θ_3
denotes the proportion of MBs that reference four MBs in the previous
frame. If the proportion of intra-coded MBs is denoted η, then
θ_1 + θ_2 + θ_3 + η = 1. From this relation, it is clear that a higher value of η
yields a lower value of p_mc.
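Equation (13) can be evaluated directly from the previous-frame corruption probability and the three reference proportions. The function name and usage values are hypothetical. The two calls differ only in θ_1, so the first has the larger intra proportion η.

```python
def p_mc(p, theta1, theta2, theta3):
    """Eq. (13): probability that an MB is corrupted via motion compensation
    when each MB of the previous frame is corrupted with probability p."""
    return (p * theta1
            + (1.0 - (1.0 - p) ** 2) * theta2
            + (1.0 - (1.0 - p) ** 4) * theta3)


if __name__ == "__main__":
    print(p_mc(0.2, 0.5, 0.2, 0.1))  # eta = 0.2
    print(p_mc(0.2, 0.7, 0.2, 0.1))  # eta = 0.0, larger p_mc
```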



[083] Then, a probability transition matrix that characterizes the error
propagation through motion compensation can be calculated by:

$$P(i, j_{mc}) = \binom{M}{j_{mc}} p_{mc}^{\,j_{mc}} \left(1 - p_{mc}\right)^{M - j_{mc}}, \qquad (14)$$

[084] where j_mc is the number of MBs corrupted through motion
compensation in frame n, i is the total number of MBs corrupted in frame n - 1,
and p_mc is given by Equation (13) with p = i/M. An n-step probability
transition matrix P^n is:
$$P^n = \prod_{k=1}^{n} P^k, \qquad (15)$$

where

$$P^n(i, j_{mc}) = P\{Y_{mc}(n) = j_{mc} \mid Y(0) = i\}. \qquad (16)$$

[085] P^k is the 1-step Markov transition matrix for frame k. The average
number of MBs corrupted through motion compensation in frame n can be
obtained by

$$E[Y_{mc}(n)] = \sum_{i=0}^{M} \sum_{j_{mc}=0}^{M} j_{mc} \, p_0(i) \, P^n(i, j_{mc}), \qquad (17)$$

[086] where p_0(i) is the probability of i MBs being corrupted in the first
frame.
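The n-step model of Equations (14) to (17) can be sketched with plain Python lists. For simplicity, this sketch tracks only the number of MBs corrupted through motion compensation and does not add new packet losses in each frame. It also reuses the same θ values for every frame. Both are assumptions for illustration, and the function names are hypothetical. `math.comb` requires Python 3.8 or later.

```python
import math


def p_mc(p, theta):
    """Eq. (13) with theta = (theta1, theta2, theta3)."""
    t1, t2, t3 = theta
    return p * t1 + (1 - (1 - p) ** 2) * t2 + (1 - (1 - p) ** 4) * t3


def transition_matrix(M, theta):
    """Eq. (14): row i is a binomial(M, p_mc) distribution with p = i / M."""
    rows = []
    for i in range(M + 1):
        q = p_mc(i / M, theta)
        rows.append([math.comb(M, j) * q ** j * (1 - q) ** (M - j)
                     for j in range(M + 1)])
    return rows


def expected_mc_corrupted(p0, matrices):
    """Eqs. (15)-(17): push the initial distribution p0 through the per-frame
    1-step matrices and return the expected number of MC-corrupted MBs."""
    dist = list(p0)
    for P in matrices:
        dist = [sum(dist[i] * P[i][j] for i in range(len(dist)))
                for j in range(len(dist))]
    return sum(j * pj for j, pj in enumerate(dist))
```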

[087] The above model is computationally complex, and is therefore
simplified by using a 1-step Markov model instead of an n-step Markov model,
and by using E[Y(n-1)] in place of i in Equation (14). Therefore, Equation (17)
becomes

$$E[Y_{mc}(n)] = M \cdot p_{mc}. \qquad (18)$$

[088] It follows that the average distortion due to motion compensation at
frame n can be expressed by

$$D_{mc}(n) = p \cdot (1 - \eta) \cdot D(n - 1), \qquad (19)$$

[089] where D(n - 1) is the average distortion of frame n - 1.
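The simplified update in Equations (18) and (19) replaces the whole distribution with a single expected value, so one frame reduces to a few multiplications. The function name is hypothetical. The sketch assumes that the proportions θ_1, θ_2, θ_3 are known for the current frame and that η = 1 - θ_1 - θ_2 - θ_3.

```python
def p_mc(p, theta1, theta2, theta3):
    """Eq. (13)."""
    return (p * theta1 + (1 - (1 - p) ** 2) * theta2
            + (1 - (1 - p) ** 4) * theta3)


def simplified_step(M, expected_corrupted_prev, d_prev, theta):
    """One frame of the 1-step approximation, Eqs. (18) and (19).

    p = E[Y(n-1)] / M stands in for the full distribution over i.
    Returns (E[Y_mc(n)], D_mc(n)).
    """
    p = expected_corrupted_prev / M
    eta = 1.0 - sum(theta)  # proportion of intra-coded MBs
    e_mc = M * p_mc(p, *theta)          # Eq. (18)
    d_mc = p * (1.0 - eta) * d_prev     # Eq. (19)
    return e_mc, d_mc
```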

[090] Model Accuracy 

MERL-ISIO, Vetro et al.

[091] Figure 7 compares the accuracy of the R-D model for
resynchronization marker insertion as a function of marker spacing or video
packet length. The rate change of inserted resynchronization markers comes
from the change of marker spacing or packet length in a range of [130, 1300]
bits. The test is performed with a channel BER = 10"^. 

[092] Figure 8 shows a test of the intra-refresh R-D model as a function of 
intra-refresh rate. The intra-refresh rate varies from 2% to 90%. From these 
figures, it can be seen that the error-resihence models according to the 
invention predict accurately the actual distortion. 

[093] Bit Allocation

[094] Based on the above described R-D models for video source 
requantization, resynchronization marker insertion, and intra-refresh, it is 
now possible to solve the R-D optimized bit allocation problem. Then, the 
resulting optimal source R-D curve can be used in the overall bit allocation 
for error resilient coding. Based on the overall optimal bit allocation scheme, 
a sub-optimal scheme to enable transcoding with lower complexity, but 
achieving similar performance, is described. 

[095] Optimized Rate Allocation: Source Requantization Only

[096] With the R-D model for video source requantization, optimal bit
allocation 120 can be achieved for a given rate budget R. Specifically, a
solution to the following problem is sought:

$$\min \sum_{k} D_k \quad \text{subject to} \quad \sum_{k} R_k \le R \quad \text{and} \quad R_{kl} \le R_k \le R_{ku}, \quad k = 0, 1, \ldots, N-1, \qquad (20)$$

where R_kl and R_ku are the lower and upper bounds of the achievable rate for the
k-th frame.

[097] For an I-frame, R_kl and R_ku can be determined by the maximum and
minimum allowable quantization scales, respectively. For a P-frame k, R_kl is
achieved by assigning the minimum quantization scale to all its previous frames
(0 to k-1), and the maximum allowable quantization scale to the current frame. On the
other hand, R_ku is obtained by assigning the maximum allowable quantization
scale to all its previous frames and the minimum quantization scale to the
current frame. In practice, R_ku can be estimated by coding all the MBs in the
current frame in intra mode.

[098] There are several known methods to solve the above optimization 
problem, e.g., a dynamic programming approach based on the Lagrange 
multiplier and a trellis. The problem with that approach is that as the number
of frames increases, the trellis grows exponentially and the size of the
problem quickly becomes intractable. Another issue is that the Lagrange
multiplier needs to be determined by traversing the trellis iteratively,
which further complicates the problem. An alternative approach incorporates
a penalty function into the minimization problem. However, that iterative
approach is relatively complex. Both approaches assume that the actual R-D 
values at various operating points are readily available, which may not be 
the case in practical applications. 
[099] The method according to the invention is based on a projected
Newton method, see Bertsekas, "Projected Newton methods for optimization 
problems with simple constraints," Tech. Rep. LIDS R-1025, MIT, 
Cambridge, MA, 1980, incorporated herein by reference. 

[0100] In order to use that method, the problem in Equation (20) needs to be modified. First, because distortion decreases as rate increases, the minimum distortion is attained when $\sum_k R_k = R$, i.e., the optimal solution always uses the entire available bit budget. Second, in practice the available bit budget is usually low enough that the rate upper bound $R_{ku}$ is rarely exceeded, so the upper bound can be eliminated. Given this, the new constrained problem is written as:

$$\min \sum_{k} D_k \quad \text{subject to} \quad \sum_{k} R^*_k = R^* \quad \text{and} \quad R^*_k \ge 0, \quad k = 0, 1, \ldots, N-1 \qquad (21)$$
[0101] where the lower bound $R_{kl}$ is eliminated by substituting $R_k$ with $R^*_k + R_{kl}$, and $R^* = R - \sum_k R_{kl}$.
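Substituting term by term shows why (21) is equivalent to (20) once the upper bound is dropped and the rate constraint is met with equality; the substitution also implies that a feasible allocation exists only if $R^* \ge 0$:

$$\sum_{k=0}^{N-1} R_k = R \;\Longleftrightarrow\; \sum_{k=0}^{N-1} \bigl(R^*_k + R_{kl}\bigr) = R \;\Longleftrightarrow\; \sum_{k=0}^{N-1} R^*_k = R - \sum_{k=0}^{N-1} R_{kl} = R^*, \qquad R_k \ge R_{kl} \;\Longleftrightarrow\; R^*_k \ge 0.$$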

[0102] One advantage of this method is that no additional parameters 
need to be introduced, such as a Lagrangian multiplier. The constraints are 
handled implicitly within the method by variable substitution and linear 
projection. Therefore, this method is comparable in complexity to its unconstrained counterpart. Another advantage of the method is that it uses Hessian information to improve convergence. Therefore, the resulting Newton-like method typically has a superlinear rate of convergence and is considerably faster than prior art methods. With this method, the size of the problem can be increased considerably without a substantial increase in computational time.
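The following is a minimal sketch of a diagonally scaled projected-Newton iteration for problem (21), written in the spirit of the Bertsekas reference; it is an illustration, not the method of the invention and not Bertsekas's exact algorithm. It assumes hypothetical separable distortion models $D_k(R_k) = a_k e^{-b_k R_k}$ with $a_k, b_k > 0$. Because it ignores the inter-frame dependency captured by the models of the invention, the Hessian is diagonal. Each iteration takes a Newton step and projects it onto $\{\sum_k R^*_k = R^*,\ R^*_k \ge 0\}$ in the Hessian-weighted norm. The projection's internal multiplier `nu` is found by bisection and is not a tuned parameter. The step is then backtracked along the projection arc with an Armijo test. The function name `projected_newton` and all of its parameters are hypothetical.

```python
import math


def projected_newton(a, b, r_low, r_total, iters=50, sigma=1e-4):
    """Sketch: minimize sum_k a_k*exp(-b_k*R_k) s.t. sum_k R_k = r_total, R_k >= r_low[k].

    Hypothetical separable model D_k(R) = a_k*exp(-b_k*R); inter-frame
    dependency is ignored, so the Hessian is diagonal.
    """
    n = len(a)
    r_star = r_total - sum(r_low)              # R* = R - sum_k R_kl
    if r_star < 0:
        raise ValueError("bit budget is below the sum of the lower bounds")
    x = [r_star / n] * n                       # R*_k = R_k - R_kl, feasible start

    def terms(x):
        return [a[k] * math.exp(-b[k] * (x[k] + r_low[k])) for k in range(n)]

    def project(y, h):
        # Hessian-weighted projection onto {sum(x) = r_star, x >= 0}:
        # x_k = max(0, y_k - nu / h_k), with nu chosen by bisection.
        lo = min(h[k] * (y[k] - r_star) for k in range(n)) - 1.0   # sum > r_star here
        hi = max(h[k] * y[k] for k in range(n)) + 1.0              # sum == 0 here
        for _ in range(200):
            nu = 0.5 * (lo + hi)
            if sum(max(0.0, y[k] - nu / h[k]) for k in range(n)) > r_star:
                lo = nu
            else:
                hi = nu
        nu = 0.5 * (lo + hi)
        return [max(0.0, y[k] - nu / h[k]) for k in range(n)]

    for _ in range(iters):
        e = terms(x)
        f0 = sum(e)
        g = [-b[k] * e[k] for k in range(n)]          # first derivatives dD_k/dR_k
        h = [b[k] * b[k] * e[k] for k in range(n)]    # diagonal second derivatives
        alpha = 1.0
        while True:                                   # Armijo rule on the projection arc
            cand = project([x[k] - alpha * g[k] / h[k] for k in range(n)], h)
            slope = sum(g[k] * (cand[k] - x[k]) for k in range(n))
            if sum(terms(cand)) <= f0 + sigma * slope or alpha < 1e-10:
                break
            alpha *= 0.5
        x = cand
    return [x[k] + r_low[k] for k in range(n)]        # map back: R_k = R*_k + R_kl
```

For example, `projected_newton([100.0, 50.0, 10.0], [0.02, 0.03, 0.05], [10.0, 10.0, 10.0], 300.0)` returns an allocation that sums to 300 and leaves every frame above its lower bound. The slopes $dD_k/dR_k$ are equal across frames, which is the same optimality condition the derivative equalization scheme relies on.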

[0103] R-D Derivative Equalization 

[0104] To provide a low-complexity implementation for the bit 
allocation, a technique to determine a suboptimal operating point is 
described. This technique is basically an R-D derivative equalization 
scheme. This scheme is based on the fact that optimal bit allocation is 
achieved at the point where the slopes of the R-D function for each 
component are equalized, i.e., made substantially the same. 

[0105] Starting from an operating point close to the optimal point, the objective is to continually adjust the operating point in the direction of the optimal point. This involves two steps:

(1) start from an operating point close to the optimal point; and
(2) move toward the optimal point and remain there as the video content and channel conditions change.

[0106] The first step is not very difficult because the initial optimization needs to be performed only for the first GOP. The second step uses the following R-D derivative equalization scheme: examine the local derivative of each R-D curve and adjust the bits allocated to each component accordingly. If the rate budget is constant, then reallocating an amount of rate $\Delta R$ from the component with the smallest absolute derivative value to the component with the largest absolute derivative value is a good approximation to the optimal solution.
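A minimal sketch of one equalization step at a constant total rate follows. It assumes the local derivatives are already available, for example from discrete differentiation. The function name `equalize_step`, the step size `delta_r` and the equality tolerance `tol` are hypothetical, because the text does not specify how they are chosen. Clamping the donor at zero bits is an added safeguard that the text does not describe.

```python
def equalize_step(rates, slopes, delta_r, tol=1e-3):
    """One R-D derivative-equalization step at a constant total rate (sketch).

    rates:   current bits per component, e.g. [requantization, resync markers, intra refresh]
    slopes:  local derivatives dD_i/dR_i, all assumed negative
    delta_r: amount of rate to move (hypothetical step size)
    tol:     relative tolerance for treating the slopes as equal (hypothetical)
    """
    mags = [abs(s) for s in slopes]
    receiver = max(range(len(mags)), key=mags.__getitem__)  # largest |dD/dR|
    donor = min(range(len(mags)), key=mags.__getitem__)     # smallest |dD/dR|
    if mags[receiver] - mags[donor] <= tol * mags[receiver]:
        return list(rates)                                  # slopes already equalized
    moved = min(delta_r, rates[donor])                      # never go below zero bits
    new_rates = list(rates)
    new_rates[donor] -= moved
    new_rates[receiver] += moved
    return new_rates
```

For example, `equalize_step([200, 30, 70], [-0.5, -2.0, -1.0], 10)` moves 10 bits from the first component to the second and returns `[190, 40, 70]`. The total rate is unchanged.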
[0107] Bit Allocation Strategy

[0108] To evaluate the rate allocation strategy described above, the following auxiliary model is used. There are $N$ transcoding components, with component $i$ operating at bit rate $R_i$ and distortion $D_i$. The total distortion is $D = \sum_{i=1}^{N} D_i$, and the total rate is $R = \sum_{i=1}^{N} R_i$. We assume that all R-D functions are convex and that $dD_i/dR_i < 0$ for all $i = 1, \ldots, N$.

[0109] In one interpretation of the problem, we are given an additional rate $\Delta R > 0$. The goal is to allocate it among the components so that the total distortion $D$ is maximally decreased. If $\Delta R$ is relatively small, then the total change in distortion, $\Delta D$, can be expressed as:

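$$\Delta D \approx \sum_{i=1}^{N} \frac{dD_i}{dR_i}\,\Delta R_i \;\ge\; \frac{dD_k}{dR_k}\sum_{i=1}^{N} \Delta R_i = \frac{dD_k}{dR_k}\,\Delta R, \qquad \Delta R_i \ge 0,\; \sum_{i=1}^{N} \Delta R_i = \Delta R \qquad (22)$$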

[0110] In the above equation, each derivative $dD_i/dR_i$ is replaced by $dD_k/dR_k$, the derivative with the highest absolute value. Because $dD_i/dR_i < 0$, this gives a lower bound on $\Delta D$. Therefore, the allocation scheme that minimizes $\Delta D$ (equivalently, maximizes $|\Delta D|$, since $\Delta D < 0$) allocates all the additional bits to component $k$.
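As a hypothetical numerical illustration (values chosen only for the example), take $N = 2$ with local slopes $dD_1/dR_1 = -3$ and $dD_2/dR_2 = -1$ and an additional rate $\Delta R = 2$. The first-order estimate $\Delta D \approx \sum_i (dD_i/dR_i)\,\Delta R_i$ then gives, for assigning all of $\Delta R$ to component 1, splitting it evenly, and assigning all of it to component 2:

$$\Delta D \approx (-3)(2) + (-1)(0) = -6, \qquad \Delta D \approx (-3)(1) + (-1)(1) = -4, \qquad \Delta D \approx (-3)(0) + (-1)(2) = -2.$$

The largest reduction comes from giving everything to the component with the steepest slope.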

[0111] In a second interpretation of the problem, we decrease the total rate $R$ by $\Delta R$. In this case, $\Delta D$ can be expressed as:
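$$\Delta D \approx \sum_{i=1}^{N} \frac{dD_i}{dR_i}\,\Delta R_i \;\ge\; -\frac{dD_l}{dR_l}\sum_{i=1}^{N} \lvert \Delta R_i \rvert = -\frac{dD_l}{dR_l}\,\Delta R, \qquad \Delta R_i \le 0,\; \sum_{i=1}^{N} \Delta R_i = -\Delta R \qquad (23)$$

where $\lvert dD_l/dR_l \rvert \le \lvert dD_i/dR_i \rvert$ and $dD_i/dR_i < 0$ for all $i = 1, \ldots, N$.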

[0112] In the above equation, each derivative $dD_i/dR_i$ is replaced by $dD_l/dR_l$, the derivative with the lowest absolute value. Therefore, the bit allocation scheme that minimizes $\Delta D$ decreases the rate of component $l$ by $\Delta R$.

[0113] In a third interpretation of the problem, we reallocate bits among the transcoding components without increasing or decreasing the total rate. To achieve this, we increase the rate of some components; we denote this group by current operating rates $R_{i_k}$ and distortions $D_{i_k}$, where $i_k \in [1, N]$. We also decrease the rate of the remaining components; we denote this group by current operating rates $R_{i_l}$ and distortions $D_{i_l}$, where $i_l \in [1, N]$. The rate increases $\Delta R_{i_k}$ and the rate decreases $\Delta R_{i_l}$ must satisfy the three conditions below:

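$$\text{(i)}\ \sum_{\Delta R_{i_k} > 0} \Delta R_{i_k} = \Delta R, \qquad \text{(ii)}\ \sum_{\Delta R_{i_l} < 0} \Delta R_{i_l} = -\Delta R, \qquad \text{(iii)}\ \Delta R > 0 \qquad (24)$$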

[0114] where $\Delta R$ is the total rate adjustment. Then, the total change in distortion can be expressed as:
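$$\Delta D \approx \sum_{i_k} \frac{dD_{i_k}}{dR_{i_k}}\,\Delta R_{i_k} + \sum_{i_l} \frac{dD_{i_l}}{dR_{i_l}}\,\Delta R_{i_l} \;\ge\; \frac{dD_k}{dR_k}\,\Delta R - \frac{dD_l}{dR_l}\,\Delta R \qquad (25)$$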



[0115] From the above equation, the optimal bit reallocation scheme for minimizing distortion deducts $\Delta R$ only from the component with the smallest absolute derivative value and adds $\Delta R$ only to the component with the largest absolute derivative value.
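For a hypothetical numerical illustration, let three components have local slopes $-3$, $-2$ and $-0.5$, and move $\Delta R = 1$ at constant total rate. With the first-order estimate $\Delta D \approx \sum_i (dD_i/dR_i)\,\Delta R_i$, the three possible single-pair moves give:

$$\underbrace{(-3)(+1) + (-0.5)(-1)}_{\text{3} \to \text{1}} = -2.5, \qquad \underbrace{(-3)(+1) + (-2)(-1)}_{\text{2} \to \text{1}} = -1, \qquad \underbrace{(-2)(+1) + (-0.5)(-1)}_{\text{3} \to \text{2}} = -1.5$$

Taking bits from the flattest curve and giving them to the steepest one yields the largest first-order distortion reduction.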



[0116] An additional point to address is the choice of $\Delta R$. Because the ordering of the derivative values $dD_i/dR_i$, $i = 1, \ldots, N$, must not change, we select the largest value of $\Delta R$ that keeps Eqs. (22), (23) and (25) valid.

[0117] This method has a lower computational cost than the globally optimal method because the entire R-D curve of each encoding component is not required. In this embodiment, two local sample points on each R-D curve can be used to compute the derivative by discrete differentiation.
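A minimal sketch of the two-point discrete differentiation follows. The function name `discrete_slope` and all sample values are hypothetical: two assumed (rate, distortion) operating points per component, for the three components named in the procedure below.

```python
def discrete_slope(point_a, point_b):
    """Estimate dD/dR from two local (rate, distortion) samples by finite difference."""
    (r1, d1), (r2, d2) = point_a, point_b
    if r1 == r2:
        raise ValueError("the two sample points must have different rates")
    return (d2 - d1) / (r2 - r1)


# Hypothetical samples: two operating points per component around the current allocation.
samples = {
    "requantization": ((100.0, 40.0), (110.0, 36.0)),
    "resync_markers": ((20.0, 12.0), (25.0, 11.5)),
    "intra_refresh": ((30.0, 18.0), (34.0, 16.0)),
}
slopes = {name: discrete_slope(*pts) for name, pts in samples.items()}
```

With these made-up samples, the slopes are about -0.4, -0.1 and -0.5. Under the equalization scheme, bits would therefore move from resynchronization markers (flattest curve) to intra refresh (steepest curve).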



[0118] Sub-Optimal Bit Allocation Procedure



[0119] The following procedure is used to enable low-complexity transcoding. For the first GOP of the video sequence, the model parameters are estimated and the R-D models for video source requantization, resynchronization marker insertion and intra-refresh are established.

[0120] Then, optimal bit allocation can be achieved for this GOP through the Lagrangian optimization process described above. For each subsequent GOP, simplified parameter estimation procedures are used to generate two local operating points, and a local derivative is obtained by discrete differentiation. If the local derivatives of the three R-D curves are equal, then the current bit allocation is retained. Otherwise, the bit allocation of the component whose local derivative has the largest absolute value is increased, and the bit allocation of the component whose local derivative has the smallest absolute value is decreased.

[0121] Effect of the Invention 

[0122] The invention provides rate-distortion (R-D) models that consider
inter-frame dependency for optimal bit allocation in error resilient video
transcoding. A sub-optimal scheme achieves similar performance with much 
lower complexity. Overall, the method according to the invention with 
variable bit allocation has superior performance compared to error-resilient 
transcoding schemes with fixed bit allocation. 

[0123] Although the invention has been described by way of examples 
of preferred embodiments, it is to be understood that various other 
adaptations and modifications may be made within the spirit and scope of 
the invention. Therefore, it is the object of the appended claims to cover all 
such variations and modifications as come within the true spirit and scope of
the invention.